Robust Acoustic Speech Emotion Recognition by Ensembles of Classifiers
Abstract
Automatic speech recognition can degrade when confronted with emotionally distorted speech. Considerable effort has been spent so far on coping with noise conditions and speaker characteristics. Yet adaptation to the emotional condition of the speaker could further improve overall performance. We therefore aim at robust and reliable recognition of the speaker's emotional state from acoustic features only, prior to speech recognition itself, so that the corresponding emotional speech models can be loaded. In this work we introduce an optimal feature set for this task, selected by Sequential Floating Search Methods. The set comprises high-level prosodic features obtained by utterance-wise statistical analysis of low-level contours such as pitch, higher-order formants, energy, and spectral development. For classification we apply ensemble methods such as Stacking, Bagging, and Boosting.
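The pipeline outlined in the abstract (utterance-wise statistics of low-level contours, followed by an ensemble classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the five functionals, the nearest-centroid base learner, the toy pitch and energy contours, and all function names. Only Bagging is shown; Stacking and Boosting are not covered.

```python
import random
import statistics
from collections import Counter

def functionals(contour):
    """Utterance-wise statistics of one low-level contour (e.g. pitch per frame).
    The choice of these five statistics is an assumption for illustration."""
    lo, hi = min(contour), max(contour)
    return [statistics.fmean(contour), statistics.pstdev(contour), lo, hi, hi - lo]

def utterance_vector(contours):
    """Concatenate the functionals of several contours into one feature vector."""
    vec = []
    for contour in contours:
        vec.extend(functionals(contour))
    return vec

def train_centroids(X, y):
    """Hypothetical base learner: one mean feature vector per class."""
    rows_by_label = {}
    for x, label in zip(X, y):
        rows_by_label.setdefault(label, []).append(x)
    return {label: [statistics.fmean(col) for col in zip(*rows)]
            for label, rows in rows_by_label.items()}

def predict_centroid(model, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda label: sqdist(model[label]))

def bagging_fit(X, y, n_models=15, seed=0):
    """Bagging: train each base learner on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(train_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Toy data (invented): each utterance is [pitch contour in Hz, energy contour].
train = [
    ([[118, 120, 122, 121], [0.30, 0.32, 0.31, 0.29]], "neutral"),
    ([[115, 117, 116, 119], [0.28, 0.30, 0.27, 0.31]], "neutral"),
    ([[124, 122, 125, 123], [0.33, 0.31, 0.34, 0.30]], "neutral"),
    ([[121, 119, 120, 118], [0.29, 0.30, 0.32, 0.28]], "neutral"),
    ([[200, 240, 180, 260], [0.80, 0.90, 0.70, 0.85]], "angry"),
    ([[210, 255, 190, 245], [0.82, 0.88, 0.76, 0.91]], "angry"),
    ([[195, 235, 185, 250], [0.78, 0.86, 0.72, 0.89]], "angry"),
    ([[205, 245, 175, 265], [0.81, 0.92, 0.74, 0.87]], "angry"),
]
X = [utterance_vector(contours) for contours, _ in train]
y = [label for _, label in train]
models = bagging_fit(X, y)

query = utterance_vector([[208, 250, 188, 256], [0.83, 0.89, 0.75, 0.90]])
print(bagging_predict(models, query))
```

In this sketch the pitch statistics dominate the distance because the features are not normalized; a real system would typically scale the features and select a subset, for instance with a floating search as the abstract describes.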
Similar resources
Speaker independent emotion recognition by early fusion of acoustic and linguistic features within ensembles
Herein we present a comparison of novel concepts for a robust fusion of prosodic and verbal cues in speech emotion recognition. Thereby 276 acoustic features are extracted out of a spoken phrase. For linguistic content analysis we use the Bag-of-Words text representation. This allows for integration of acoustic and linguistic features within one vector prior to a final classification. Extensive...
Robust Recognition of Emotion from Speech
This paper presents robust recognition of selected emotions from salient spoken words. The prosodic and acoustic features were used to extract the intonation patterns and correlates of emotion from speech samples in order to develop and evaluate models of emotion. The computed features are projected using a combination of linear projection techniques for compact and clustered representation of ...
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have shown their performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
A Comparative Study of Gender and Age Classification in Speech Signals
Accurate gender classification is useful in speech and speaker recognition as well as in speech emotion classification, because better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also applied in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...
Bimodal Emotion Recognition from Speech and Text
This paper presents an approach to emotion recognition from speech signals and textual content. In the analysis of speech signals, thirty-seven acoustic features are extracted from the speech input. Two different classifiers, Support Vector Machines (SVMs) and a BP neural network, are adopted to classify the emotional states. In text analysis, we use the two-step classification method to recognize ...